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Abstract 

We describe a general strategy for sampling configurations from a given distribution, not based on the standard 
Metropolis (Markov chain) strategy. It uses the fact that nontrivial problems in statistical physics are high 
dimensional and often close to Markovian. Therefore, configurations are built up in many, usually biased, steps. 
Due to the bias, each configuration carries its weight which changes at every step. If the bias is close to optimal, all 
weights are similar and importance sampling is perfect. If not, "population control" is applied by cloning/killing 
partial configurations with too high/low weight. This is done such that the final (weighted) distribution is unbiased. 
We apply this method (which is also closely related to diffusion type quantum Monte Carlo) to several problems 
of polymer statistics, reaction-diffusion models, sequence alignment, and percolation. 
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1. Introduction 

Although Markov chain (Metropolis-type) 
Monte Carlo (MC) simulations dominate in statis- 
tical physics today, simulations not based on this 
strategy have been used from early times on. Well 
known examples are evolutionary (in particular 
genetic) algorithms [1], diffusion type quantum 
MC [2], and several algorithms devised for the 
simulation of long chain molecules [3-7] . 

As these methods were developed independently 
in different communities, it was not generally rec- 
ognized - or rather forgotten - that most of them 
are realizations of a common strategy, as pointed 
out by Aldous and Vazirani [8] who also coined the 
name "go with the winners" . But essentially the 
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same basic strategy was already discussed as a gen- 
eral purpose sampling method by Herman Kahn 
in 1956 [9] who called it "Russian Roulette and 
Splitting" , and attributed it to unpublished work 
by von Neumann and Ulam. For further applica- 
tions of this strategy see [10-12]. The last two ref- 
erences also discuss applications in lattice spin sys- 
tems and Bayesian inference, fields which will not 
be treated in the present review. 



2. The Basic Strategy 

2.1. Sequential Importance Sampling 

As in any MC method, we draw configurations x 
from some distribution p{x) . Writing the partition 
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sum as 

M 

Z = Y, e-^"^^^ - M-^ e-^^^^"'>/p{^n, (1) 

X a — 1 

we can interpret this as each configuration having 
its weight W{x) = e~^^^^^ /p{x). In importance 
sampling we try to choose p(x) oc e~^'^^^\ so that 
all weights become equal. 

We now assume that we can break up the con- 
struction of a configuration into N single steps 
XmiT- — 1,2, ... N . For a polymer, the n-th step 
would e.g. be the placement of the n-th monomer. 
Then the weight W is obtained recursively as = 
Wn, Wo = 1, 

^t3{E{xi,...x„)~E{xi,...x„-i)) 

Wn = Wn-1 ^ ^ . (2) 

Pn \'^n \ Xi . . . Xyi— 1 j 

In statistics this is called sequential importance 
sampling (SIS) [12]. 

In some cases, "natural" values for the Pn{xn) 
are easy to guess. In the Rosenbluth method [3] for 
simulating a self-avoiding walk (SAW), e.g., one 
chooses uniformly among the free neighbours. But 
this is not optimal, a better choice is provided by 
Markovian anticipation [13]. In general, for choos- 
ing p„(x„|xi . . . Xn^i) one has to depend on heuris- 
tics, except in the case of diffusion quantum MC 
where perfect importance sampling {Wn = 1) is 
possible if the ground state wave function is known 
[2,14]. Specific choices will be discussed together 
with applications. 



2.2. Population Control 

The main drawback of SIS is that the distribu- 
tion of weights can become extremely wide. If long 
range correlations are weak (as e.g. for SAWs), 
logW^TV is roughly a sum of independent terms. 
This suggests the following strategy: 

If at any step n the weight Wn is above a suit- 
ably chosen threshold W^ , we make an additional 
copy of the configuration xi, . . . Xn, and give both 
copies the weight Wn/2. Both are then grown inde- 
pendently (with eventual later copyings) up to full 



length [^. In this way high weights are suppressed 
and precious "good" configurations are less likely 
to be lost entirely by bad subsequent moves. In [4] 
a similar strategy (but not based on weights) was 
called 'enrichment'. 

On the other hand, if Wn falls below another 
threshold W~ , we draw a random number r G 
[0, 1]. If r < 1/2 we kill the configuration and start 
a new one. If r > 1/2 we keep it and double its 
weight. 

Obviously, for any choice of the thresholds, nei- 
ther the cloning nor the pruning introduce any ad- 
ditional bias. Thus we can, in principle, use any 
choice for Wn and W~ , and we can change them 
ad libitum during the simulation. Bad choices will, 
however, lead to inefficiency, just as do bad choices 
for pnix). 

Except at very low temperatures where special 
care is needed [15,16] we found the following strat- 
egy to be sufficient: 

- For the first configuration(s) we do not clone at 
all and kill only if the weight is exactly zero. 

- If we have already m previous configurations 
which had reached size > n, we estimate 
from them the partition sum Z„ w Z„ = 
TO-i EaLi T^n(x"). We then set W^ = C^Zn 
with C+ « 1/C- « 0(1) - 0(10). 

2.3. Depth First Versus Breadth First 

As described above, the algorithm is most effi- 
ciently implemented in a depth first fashion, and 
as such was called PERM (pruned-enriched Rosen- 
bluth method) in [7] . In a depth first approach [17] , 
we follow one copy until its end before we take up 
the other copy. In breadth first search, on the other 
hand, we treat all copies in parallel and handle the 
n-th steps of all copies before we go to n -I- 1. 

Evolutionary algorithms [1] are usually imple- 
mented breadth first. One puts up a population of 
M replicas which are evolved simultaneously, and 
population control is exercized such that M stays 
constant during the evolution. The same is true for 



^ In some cases (e.g. at low temperatures, where Boltz- 
mann factors are huge), it might be necessary to make 
several copies and to distribute the weight evenly among 
them. 
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most implementations of the "go with the winners" 
strategy. This has several advantages: 

- Breadth first approaches are well adapted for 
massively parallel computers. One simply puts 
one configuration on each processor. 

- One has no problem with keeping the number of 
replicas constant. 

- One can use more general population control 
strategies [12]. 

But the last two points seem minor in most appli- 
cations we have studied. On the other hand, the 
main advantage of depth first is the elegance and 
efficiency of the codes. The easiest implementation 
is by means of recursion (for a pseudocode see [7] ) . 
Copies (or rather instructions to make copies) are 
then put on a stack which is maintained automat- 
ically if recursive function calls are used. Storage 
use is minimized (as only a single copy and its his- 
tory is kept in memory), and communications are 
also less than in breadth first. 



3. Multiple Spanning Percolation Clusters 

Let us consider percolation on a large but fi- 
nite rectangular lattice in 2 < d < 6. We single 
out one direction as "spanning" . In this direction 
boundaries are open, while periodic b.c. are used 
in the other direction(s). For a long time it was 
believed that there is at most one spanning clus- 
ter (which touches both open boundaries) in the 
limit of large lattices, keeping the aspect ratio fixed 
{Li = XiL, L oo, i = 1, . . . d). 




Fig. 1. Configuration of 5 spanning site percolation clus- 
ters on a lattice of size 500 X 900. Any two clusters keep a 
distance of at least 2 lattice units. Lateral boundary condi- 
tions are periodic. The probability to find 5 such spanning 
clusters in a random disorder configuration is fa lO"^'^. 



Since there is no spanning cluster for subcritical 
percolation and exactly one in supercritical, the 
only relevant case is critical percolation. There it 
is now known that the probabilities to have 
exactly k spanning clusters are all non-zero in the 
limit L — > cxD. In d = 2 they are known exactly 
from conformal invariance, but for d > 3 no exact 
results are known. But there is a conjecture by 
Aizenman [18], stating that for a lattice of size L x 
. . . X L X (rL) {rL is the length in the spanning 
direction) Pk ~ e^"'' with 

a cx fc'i/(''-i) for fc > 1. (3) 

For d — 2 one has a ~ fc^, in agreement with 
Eq. (3). A generalization in d = 2 consists in de- 
manding that clusters are separated by at least q 
paths on the dual lattice [19]. In that case, and for 
periodic transverse b.c, 

§[((9 + 1)^)'-!] • fc>2,d = 2 (4) 

In order to test Eqs. (3), (4) for a wide range of 
values of k and r, one has to simulate events with 
tiny probabilities, InP^ ^ —10^ to —10^. It is thus 
not surprising that previous numerical studies have 
verified Eq. (4) only for small values of k, and have 
been unable to verify or disprove Eq. (3) [20,21]. 

To demonstrate that such rare events can be 
simulated with PERM, we show in Fig. 2 a lat- 
tice of size 500 x 900 with 5 spanning clusters 
which keep distances > 2. Eq. (4) predicts for it 
Pk = exp(— 3367r/5) w 10~^^. This configuration 
was obtained by letting 5 clusters grow simultane- 
ously, using a standard cluster growth algorithm 
[22], from the left border. Precautions were taken 
that they grew with the same speed towards the 
right, i.e. if one of them lagged behind, the growth 
of the others was stopped until the lagging clus- 
ter had caught up. If one of them died, or if two 
came closer than two lattice units, the entire con- 
figuration was discarded. If not, it was cloned if 
the weight Wn exceeded 3Z„. Note that here the 
growth was made without bias, and therefore no 
pruning was necessary. 

In this way we could check Eq. (4) with high 
precision, proving the correctness of our algorithm. 

More interesting is the test of Eq. (3) for d = 3. 
Simulating up to 16 parallel clusters on lattices of 
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sizes up to 128 x 128 x 2000 (leading to probabilities 
as small as 10~^°°!) gave perfect agreement with 
Eq. (3) [23]. 



4. Polymers 

One of the main applications of the go-with- 
the- winners strategy is configurational statistics of 
long polymer chains. For a breadth first algorithm 
which otherwise is very similar to PERM see [6] . 

4.1. O- Polymers 

PERM is particularly efScient near the so-called 
'theta-' or coil-globule transition. According to the 
generally accepted scenario, the theta-point is tri- 
critical with upper critical dimension = 3 [24]. 

At Tff, bias correction and Boltzmann fac- 
tors nearly cancel in = 3. Therefore, polymers 
have essentially random walk configurations with 
small (logarithmic) corrections. Therefore, a non- 
reversing random walk (U-turns are forbidden) 
for SIS is already sufficient to give good statistics 
with very few pruning and enrichment events. In 
[7] chains made of up to 1,000,000 steps could be 
sampled with high statistics within modest CPU 
time [7]. They were done in finite volumes ("dense 
limit") and verified that the 0-point indeed is a 
second order transition. The most precise verifica- 
tion of logarithmic corrections came from chains 
with N = 10, 000 in infinite volume. The devia- 
tions from random walk behaviour turned out to 
be much stronger than the leading-log corrections 
predicted from the renormalization group [25], 
but agreement improves substantially when higher 
order corrections are included in the latter [26] . 

4.2. Critical Unmixing 

A related problem is the unmixing of semidilute 
polymer solutions. For any finite chain length N 
this is in the Ising universality class. But in ad- 
dition to the Ising scaling laws, there are further 
universal scaling laws for parameters and ampli- 
tudes which, from the Ising point of view, would be 
non-universal. In particular, the critical tempera- 
ture should approach Tg when N ^ oo, Tc — Tg ~ 



N and the critical monomer concentration 
should tend to zero, 



-1/2 



(5) 



The exponents here are mean field, appropriate 
for d = 3. Indeed one should also expect loga- 
rithmic corrections [25] . Previous experiments had 
suggested an exponent 0.38 ± 0.01 in Eq. (5). This 
would be very hard to understand and has stirred a 
lot of theoretical activity (for a review see [29,27]). 
Simulations using PERM [27] showed that this is 
wrong: The deviations from Eq. (5) can be under- 
stood most easily as logarithmic corrections. 

4.3. DNA Melting 

DNA in physiological conditions forms a double 
helix. Changing the pH value or increasing T can 
break the hydrogen bonds between the base pairs, 
and a phase transition to an open coil occurs. Ex- 
periments suggest it to be first order [35] . While a 
second order transition would be easy to explain 
[36,37], no previous model had been able to give a 
first order transition. 

The model studied in [28] lives on a simple cu- 
bic lattice. A double strand of DNA with length 
is described by a diblock copolymer of length 2A'', 
made of N monomers of type A and N monomers 
of type B. All monomers have excluded volume in- 
teractions, i.e. two monomers cannot occupy the 
same lattice site, with one exception: The fc-th A- 
monomer and the k-th B-monomer, with k being 
counted from the center where both strands are 



N=500 . 

N=1000 ■ 

N=1500 > 
N=2000 

N=2500 > 

N=3000 > 



0.4 

n/N 



Fig. 2. Histograms of the number of contacts, for single 
strand length N = 500, . . . 3000, at e = ec. On the horizon- 
tal axis is plotted n/N as is appropriate for a first order 
transition. 
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joined together, can occupy the same site. If they 
do so, then they even gain an energy — e. This mod- 
els the binding of complementary bases. 

The surprising result of simulations of chains 
with N up to 4000 is that the transition is first or- 
der, but shows finite scaling behaviour as expected 
for a second order transition with cross-over expo- 
nent (j) = 1. To demonstrate this, wc show in Fig. 4 
energy histograms for different chain lengths. One 
sees two maxima, one at n = and the other at 
n ~ N/2, whose distance scales proportionally to 
A^. But in contrast to usual first order transitions 
the minimum in between docs not deepen with in- 
creasing TV. This is due to the absence of any anal- 
ogon to a surface tension. The same conclusion is 
obtained from specific heat data and from end-to- 
end distances [28]. 

In [28] we also studied similar models with (par- 
tially) switched ofi^ excluded volume effects. They 
show that excluded volume is the main force mak- 
ing the transition first order, as also confirmed by 
subsequent analytic calculations [38] . 

4.4. Native Configurations of Toy Proteins 

Predicting the native (« ground) states of pro- 
teins is one of the most challenging problems in 
mathematical biology [39] . It is difficult because of 
the many local energy minima. 

In view of this, there exists a large literature 
on finding ground states of artificially constructed 
heteropolymers. Most of these models are formu- 
lated on a (square or simple cubic) lattice and 
use only few monomer types. The best known ex- 
ample is the HP model of K. Dill [40] which has 
two types of amino acids: hyrophobic (H) and hy- 
rophilic (polar, P) ones. With most algorithms, one 
can find ground states typically for random chains 
of lengths up to ~ 50. 

In [16] we used PERM to study several se- 
quences, of the HP model and of similar models, 
which had been discussed previously by other 
authors. In all cases we found the known lowest 
energy states, but in several cases we found new 
ones. A particularly impressive example is a chain 
of length 80 with two types of monomers, con- 
structed such that it should fold into a bundle of 
four "helices" with an energy —94 [41]. Even with 



a specially designed algorithm, the authors of [41] 
were not able to recover this state. With PERM 
we not only found it easily, we also found several 
lower states, the lowest one having energy —98 
and a completely different structure. 

4.5. Miscellaneous 

Applications of PERM to other polymer prob- 
lems are treated in [15,30,13,31-34]. For problems 
with open coils, a bias strategy called Moarko- 
vian anticipation in [13] worked very well. Integrat- 
ing over the disorder, we recently could also map 
a biased random walker in the presence of ran- 
dom traps onto a stretched collapsed polymer [42] . 
Without bias, the transition from the finite time 
Rosenstock to the asymptotic Donsker-Varadhan 
behaviour is in 3d akin to a cross-over in a first or- 
der phase transition. With bias, the delocalization 
(globule-stretch) transition is first order in > 2 
[42]. 



5. Lattice Animals (Randomly Branched 
Polymers) 

Consider the set of all connected clusters of n 
sites on a regular lattice, with the origin being 
one of these sites, and with a weight defined on 
each cluster. Lattice animals are defined by giv- 
ing the same weight to each cluster. This distin- 
guishes them from percolation clusters where the 
weight depends on the 'wetting' probability p. In 
the limit p ^ this difference disappears, and the 
two statistics coincide. It is believed that lattice 
animals are a good model for randomly branched 
polymers [43]. While there existed no efficient al- 
gorithm for estimating the animal partition sum 
there exist very simple and efficient Leath-type [22] 
algorithms for percolation clusters. 

Our PERM strategy [14,44] consists in starting 
off to generate subcritical percolation clusters by a 
(breadth first) Leath method, re- weighing them as 
animals while they are still growing, and in making 
clones of "good" ones. Since we work at p < pc, we 
do not need pruning. The threshold W+ for cloning 
is chosen such that it depends both on the present 
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Fig. 3. A typical lattice animal with 8000 sites on 
the square lattice. 

animal weight and on the anticipated success for 
further growth. 

In this way we obtained good statistics for ani- 
mals of several thousand sites, independent of the 
dimension of the lattice. A typical 2-d animal with 
8000 sites is shown in Fig. 5. We also simulated 
animal collapse (when each nearest neighbor pair 
contributes — e to the energy), and animals near an 
adsorbing surface [44] . 



6. Error Estimates and Reliability Tests 

Statistical errors can be estimated as usual by di- 
viding a long run into several bunches, computing 
averages over each bunch, and studying the fluctu- 
ations between them. For PERM the situation is 
indeed rather easy, since each tour (set of all con- 
figurations generated by cloning from one common 
ancester) is independent of any other. 

To check for excessive fluctuations in weights W 
of entire tours, we make a histogram on a loga- 
rithmic scale, P(log(W^)), and compare it with the 
weighted histogram W P{\og{W)). If the latter has 
its maximum for values of log(H^) where the for- 
mer is already largo (i.e. where the sampling is al- 
ready sufficient), we are presumably on the safe 
side. However, if WP{\.og{W)) has its maximum 
at or near the upper end of the sampled range, wo 




-12 -10 -8 -6 -4 -2 2 -20 -1 5 - 1 -5 5 

log W log W 



Fig. 4. Full lines arc histograms of logarithms of tour 
weights, normalized as tours per bin. Broken lines show 
the corresponding weighted distributions, normalized so as 
to have the same maximal heights. Weights W arc only 
fixed up to a /3-dependent multiplicative constant. While 
the left panel suggests a reliable simulation, the right one 
was indeed wrong (from ref. [32]). 

should be skeptical. 

In Fig. 6 we illustrate this with two figures taken 
from [32]. While the left panel gave rise to correct 
results, the right one did not. 



7. Conclusion 

We have seen that MC simulations not following 
the Metropolis scheme can be very efficient. We 
have illustrated this with a wide range of problems. 
Conspicuously, the Ising model was not among 
them. It simply would be very hard to beat, say, 
the Swendsen-Wang algorithm. In principle, the 
go-with-the-winners strategy has as wide a range 
of applications as the Metropolis scheme. Its only 
requirement is that instances (configurations, his- 
tories, ...) are built up in small steps, and that the 
growth of their weights during the early steps of 
this build-up is not too misleading. 

The method is not new. It has its roots in algo- 
rithms which have been regularly used for several 
decades. Some of them, like genetic algorithms, 
are familiar to most scientists, but it is in general 
not well appreciated that they can be made into 
a general purpose tool. And it seems even less ap- 
preciated how closely related are methods devel- 
oped for quantum MC simulations, polymer simu- 
lations, and optimization methods. I firmly believe 
that this close relationship can be made use of in 
many more applications to come. 

Among these are significance tests for sequence 
alignment, where one needs large samples of ran- 
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dom pairs of seqences in order to check whether an 
observed alignment is significant. Instead of really 
drawing random pairs, one can use PERM to draw 
biased pairs which are more similar than random 
ones, enhancing thereby the interesting high-score 
region [45]. 

Another application is to epidemic models 
where one can follow the fate of epidemics which 
have a very low chance of survival since, e.g., they 
started in a very hostile environment which they 
first have to adopt to their needs. Here simula- 
tions with PERM [46] allowed to verify with very 
high statistics the claim of [47] that no power laws 
result, in contrast to previous suggestions. A fi- 
nal application to a toy 'population' model [48] is 
discussed in [14]. 
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